Normalization and Differential Gene Expression Analysis of Microarray Data
Author
Abstract
DNA microarray technologies can simultaneously measure the abundance of thousands of mRNA sequences. Analysis of microarray data involves many steps, such as image analysis, background correction, and normalization, as well as more classical statistical analysis such as testing for significant differences between groups of arrays. The work presented in this thesis focuses on Affymetrix GeneChip arrays and deals with normalization and the problem of finding differentially expressed genes. Normalization of microarray data is essential to allow between-array comparisons. A procedure called Contrast Normalization is proposed and compared with existing methods, together with two additional presented methods, Cyclic-Loess and Quantile Normalization. All three presented methods improve on the performance of the existing methods, with a slight edge for Quantile Normalization. The quality of microarray data often varies between arrays. A model called WAME has been proposed that uses a global covariance matrix to account for differing variances and array-to-array correlations, and thus defines a weighted analysis for finding differentially expressed genes. This thesis presents two new methods for estimating the covariance matrix. Both methods have shorter computer run-times than the existing method. Moreover, the second proposed method greatly reduces the bias of the existing method when used on simulated data with regulated genes, although to a lesser degree for real data with many regulated genes. Microarray data frequently show a dependency between variability and intensity level, which is ignored by the majority of moderated t-tests. The WAME model is extended to incorporate this dependency, and two locally moderated t-tests are proposed: Probe level Locally moderated Weighted median-t (PLW) and Locally Moderated Weighted-t (LMW).
When compared with 12 existing methods on 5 spike-in data sets, the PLW method produces the most accurate ranking of regulated genes in 4 of the 5 data sets, whereas LMW consistently performs better than all (globally) moderated t-tests.
Empirical Bayes models for multiple probe type arrays at the probe level. Submitted to BMC Bioinformatics.
Contents: 1 Introduction; 2 Background; References
Acknowledgements
Truly, "no man is an island", and indeed these lines would never have been written without the encouragement and guidance I have received from quite a few people. My adviser during this period as a PhD student, the third and presumably the last period, has been Mats Rudemo. Always positive, and with great competence, Mats has, together with my co-adviser Petter Mostad …
Similar resources
Global gene expression analysis using microarray to study differential vulnerability to neurodegeneration
Neurodegenerative disorders such as Parkinson’s disease, motor neuron disease and Alzheimer’s disease are characterized by loss of specific cells within certain regions of the brain. One of the most compelling questions is why specific cell populations are vulnerable to neurodegeneration. We addressed this question by studying global gene expression changes using an animal model of ...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression data provide a wealth of information on thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method, selecting high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...
Application of Clustering Methods in DNA Microarrays
Background: DNA microarray technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...
Bayesian Differential Analysis of Gene Expression Data
This paper describes a novel Bayesian method for the differential analysis of large scale gene expression data. The novelty of the method is the use of a contamination model that integrates the different sources of variability that affect gene expression data measured with microarray technology, thus removing the need for arbitrary normalization.
Publication date: 2007